Many models are too complex for quadratic approximation or other short cuts.
We need a general method of numeric integration to explore the posterior effectivly.
MCMC \(\lim_{t\to\infty}\) posterior
Monte Carlo: “repeated random sampling to obtain numerical results” -wikipedia
Markov chain: stochastic model in which state at time \(t\) depends only on previous state at \(t-1\)
Imagine this one-dimensional posterior
Imagine this one-dimensional posterior:
Imagine we take a guess for \(\theta\) This is the first step of our (Markov) chain
We can then find the height for this guess
height = likelihood of data given \(\theta\) \(\times\) prior of \(\theta\)
= \(\text{Pr}(x|\theta) \times \text{Pr}(\theta)\).
Choose a proposal for a new value on the chain. - Our proposal distribution is centered on our last guess. - Can be any symmetric shape.
Find height (posterior-ish value) for our proposal
Calculate the ratio of those heights: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta)\text{Pr}(\theta)} = \frac{0.078}{0.07} = 1.124 \]
We want to calculate the ratio of posterior probabilities:
\[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t-1} | x)} = \frac{\frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x)}}{\frac{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})}{\text{Pr}(x)}} \] but we do not know (and cannot calculate) \(\text{Pr}(x)\).
However, they cancel out! We just need the numerators! \[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t-1} | x)} = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta)\text{Pr}(\theta)} \] ## Step 3 {.smaller}
Choose a random uniform variable, \(U \sim \text{Uniform}(0,1)\)
If \(r > U\), accept the proposal & add it to the chain
\[ r = \frac{0.061}{0.078} = 0.781 ; U = 0.89 \]
Since \(r < U\), reject proposal & keep the current position.
\[ r = \frac{0.079}{0.078} = 1.004 \] \(U=0.212\)
Accept proposal & add to chain
It works!
This is the Metropolis Algorithm
What if we get proposals outside of possible boundaries?
However, proposal distribution is \(a\)symmetric:
\[ \text{Pr}(\theta_{t-1} \rightarrow \theta_p) \neq \text{Pr}(\theta_{p} \rightarrow \theta_{t-1}) \]
Adjust the ratio to: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})}\times \frac{\text{Pr}(\theta_{t-1}|\theta_p)}{\text{Pr}(\theta_p|\theta_{t-1})} \]
and it works!
Imagine a proposed jump from one of these points could easily be far from posterior density
Hamiltonian Monte Carlo
First, take the negative log, so high probability = low areas
First, take the negative log, so high probability = low areas
Take the first guess
…and give it a bit of momentum (in a direction of parameter space)
Track it’s movement for a certain amount of “time” using Hamiltonian equations
\[ \begin{align} \text{Total Energy} &= \text{Potential} + \text{Kinetic}\\ E &= U(\theta) + K(\text{momentum}) \end{align} \]
After a bit of time, point at end is the new proposal
No analytic solution to Hamiltonian for most problems, so:
No analytic solution to Hamiltonian for most problems, so:
Proposal acceptance actually reflects the difference in energy (i.e., how well we followed the path of the ball)
So unless there were divergences, we should accept
Overall, each proposal requires many more calculations (many steps, calculating kinetic energy & momentum at each), but proposals are much better / rarely rejected, so much more efficient overall